Exploiting Category-Specific Information for Multi-Document Summarization
Abstract
We show that using information shared across document sets that belong to the same category improves the quality of automatically extracted content in multi-document summaries. This simple property is widely applicable in multi-document summarization tasks, and can be encapsulated by the concept of category-specific importance (CSI). Our experiments show that CSI is a valuable metric to aid sentence selection in extractive summarization tasks. We operationalize the computation of CSI for sentences by introducing two new features that can be computed without any external knowledge. We also generalize this approach, showing that when manually curated document-to-category mappings are unavailable, automatic categorization of document sets also improves summarization performance. We have incorporated these features into a simple, freely available, open-source extractive summarization system called SWING. In the recent TAC-2011 guided summarization task, SWING outperformed all other participating summarization systems as measured by automated ROUGE metrics.
Related resources
SWING: Exploiting Category-Specific Information for Guided Summarization
We present our work towards building a robust multiple document summarizer (SWING), with a focus on guided summarization. SWING is an extractive summarizer built upon information retrieval principles. Our key contribution is utilizing category knowledge, collected over all news topics, to calculate category-specific importance (CSI) of sentences. We propose two new category-specific features in...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Exploiting Cross-Document Relations for Multi-document Evolving Summarization
This paper presents a methodology for summarization from multiple documents which are about a specific topic. It is based on the specification and identification of the cross-document relations that occur among textual elements within those documents. Our methodology involves the specification of the topic-specific entities, the messages conveyed for the specific entities by certain textual ele...
Generating Summaries Using Sentence Compression and Statistical Measures
In this paper, we propose a compression-based multi-document summarization technique that incorporates word bigram probability and a word co-occurrence measure. First, we implemented a graph-based technique to achieve sentence compression and information fusion. In the second step, we use hand-crafted, rule-based syntactic constraints to prune our compressed sentences. Finally we use probabilistic me...
A Summarization System with Categorization of Document Sets
We participated in both the single-document and multi-document summarization tasks at TSC 2002. We have incorporated two modules into our earlier summarization system, which is based on a sentence-extraction technique, so that we could apply the system to the multi-document summarization task. One is a module to categorize document sets and the other is to estimate the similarity between sen...